Improved Sequential Pattern Mining Using an Extended Bitmap Representation

Authors

  • Chien-Liang Wu
  • Jia-Ling Koh
  • Pao-Ying An
Abstract

The main challenge in mining sequential patterns is the high cost of support counting for a large number of candidate patterns. To address this problem, the SPAM algorithm was proposed at SIGKDD 2002; it combines a depth-first traversal of the search space with a vertical bitmap representation to provide efficient support counting. According to its experimental results, SPAM outperformed the earlier SPADE and PrefixSpan algorithms on large datasets. However, SPAM is efficient only under the assumption that a huge amount of main memory is available, which calls its practicality into question. In this paper, an improved version of SPAM, called I-SPAM, is proposed. By extending the data representation structures, several heuristic mechanisms are proposed to further speed up support counting. Moreover, the memory required to store temporary data during mining is less than that needed by SPAM. The experimental results show that I-SPAM achieves execution times of the same order of magnitude as SPAM, and sometimes better, while requiring about half of SPAM's maximum memory.
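
The vertical bitmap idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' I-SPAM implementation, and it does not model the extended structures, which the abstract does not describe. It assumes a SPAM-style layout in which each item has one bitmap per data sequence and bit j is set when the item occurs in the j-th itemset. The function names (`build_bitmaps`, `support`, `s_step`, `i_step`) and the use of Python integers as bitmaps are assumptions made for illustration only.

```python
from typing import Dict, List, Set

Sequence = List[Set[str]]  # a data sequence is an ordered list of itemsets


def build_bitmaps(db: List[Sequence]) -> Dict[str, List[int]]:
    """Map each item to one integer bitmap per sequence (bit j = itemset j)."""
    bitmaps: Dict[str, List[int]] = {}
    for sid, seq in enumerate(db):
        for pos, itemset in enumerate(seq):
            for item in itemset:
                bitmaps.setdefault(item, [0] * len(db))[sid] |= 1 << pos
    return bitmaps


def support(bitmap: List[int]) -> int:
    """Count the sequences whose bitmap has at least one bit set."""
    return sum(1 for bits in bitmap if bits != 0)


def s_step(prefix: List[int], item: List[int]) -> List[int]:
    """Extend a pattern with `item` as a new, strictly later itemset."""
    out = []
    for p, i in zip(prefix, item):
        if p == 0:
            out.append(0)
            continue
        first = (p & -p).bit_length() - 1  # earliest position matching the prefix
        # Keep only occurrences of the item strictly after that position.
        out.append((i >> (first + 1)) << (first + 1))
    return out


def i_step(prefix: List[int], item: List[int]) -> List[int]:
    """Extend the last itemset of a pattern with `item` (same position)."""
    return [p & i for p, i in zip(prefix, item)]


# Hypothetical toy database of three sequences.
db = [
    [{"a"}, {"b"}, {"a", "b"}],
    [{"b"}, {"a"}],
    [{"a", "b"}, {"c"}],
]
bm = build_bitmaps(db)
print(support(bm["a"]))                    # 3
print(support(s_step(bm["a"], bm["b"])))   # 1: only the first sequence has a, then b
print(support(i_step(bm["a"], bm["b"])))   # 2: {a, b} occurs in the first and third
```

Support counting reduces to bitwise operations followed by a non-zero check per sequence, which is why the representation is fast; the cost is that every candidate pattern carries a bitmap for the whole database, which is consistent with the memory concern the abstract raises.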

Similar resources

TKS: Efficient Mining of Top-K Sequential Patterns

Sequential pattern mining is a well-studied data mining task with wide applications. However, fine-tuning the minsup parameter of sequential pattern mining algorithms to generate enough patterns is difficult and time-consuming. To address this issue, the task of top-k sequential pattern mining has been defined, where k is the number of sequential patterns to be found, and is set by the user. In ...

Incremental Mining of Across-streams Sequential Patterns in Multiple Data Streams

Sequential pattern mining searches data sequences for frequent, time-ordered sequential patterns and has wide applications. Data streams are streams of data that arrive at high speed. Because memory capacity is limited and mining must happen in real time, the mining results need to be updated in real time. Multiple data streams are the simultaneous arrival of a plurality ...

A Framework for Mining Closed Sequential Patterns

Sequential pattern mining algorithms developed so far provide better performance for short sequences but are inefficient at mining long sequences, since long sequences generate a large number of frequent subsequences. To efficiently mine long sequences, closed sequential pattern mining algorithms have been developed. These algorithms mine closed sequential patterns which don’t have any super se...

An Advanced Model for Mining Time Interval Sequential Patterns in Stream data

Most existing sequential pattern mining algorithms have difficulty finding long, significant time-interval sequential patterns in information streams. In this paper, we propose a new bitmap-based algorithm of mining d time-interval sequential pattern in information stream called DSBMMS, which is based on binary bit counting and multiple time-interval sequential position. We transform the whole seq...

Efficient Sequential Pattern Mining Algorithms

Sequential pattern mining is a heavily researched area in the field of data mining with a wide variety of applications. The task of discovering frequent sequences is challenging, because the algorithm needs to process a combinatorially explosive number of possible sequences. Most of the methods dealing with the sequential pattern mining problem are based on the approach of the traditional task of...

Publication date: 2005